Recently, a quantum prediction problem was proposed in the Bayesian framework. It has been shown
that Bayesian predictive density operators are the best predictive density operators when they are
evaluated by the average relative entropy with respect to a prior. As an illustrative example, we treat
the Gaussian states family, adopting a Gaussian distribution as a prior, and give the Bayesian
predictive density operator with the heterodyne measurement fixed. We show that it is better than
the plug-in predictive density operator based on the maximum likelihood estimate by calculating
the average relative entropy of each.
In quantum statistics, problems of statistical inference
and state estimation have received a lot of attention over
the past several years with recent developments in
experimental techniques. Historically, the parameter
estimation problem for quantum systems dates back
a quarter of a century, when Helstrom, Holevo, and other
researchers vigorously investigated the topic and gave some
extensions of concepts of mathematical statistics based on
classical probability.

The Bayesian approach to quantum statistics has also been
investigated. Jones derived a quantum
Bayes rule for pure states with the uniform prior. Later,
Buzek et al. [5] pointed out that it can be applied to
mixed states with a purification ansatz. Schack et al.
extended his result to a more general framework of
exchangeable states. They showed that a quantum state
after a measurement can be interpreted as the state
averaged over the posterior. Buzek et al. recommended
using the Bayesian technique especially when the sample size
of the experimental data is small. They proposed using a
posterior state, which corresponds to a posterior distribution
in the classical counterpart.

From the viewpoints of information quantity and the Bayes
rule, however, Bayesian estimation of quantum states
has not been fully discussed. The performance of the
Bayesian approach compared with other approaches, such
as the maximum likelihood method, has not been
discussed theoretically. Tanaka and Komaki showed that
the Bayesian method performs better than the
plug-in method when exchangeable states are
considered. In the present paper, we review their result and calculate
the Bayesian predictive density operator for the Gaussian
states family with the heterodyne measurement.



We briefly summarize some notation for quantum
measurement. Let $\mathcal{H}$ be a separable (possibly
infinite-dimensional) Hilbert space of a quantum system. A Hermitian
operator $\rho$ on $\mathcal{H}$ is called a state or density operator if it
satisfies

$$\mathrm{Tr}\,\rho = 1, \qquad \rho \ge 0.$$

We denote the set of all states on $\mathcal{H}$ by $\mathcal{S}(\mathcal{H})$.

Let $\Omega$ be a space of all possible outcomes of an
experiment (e.g., $\Omega = \mathbb{R}^n$) and suppose that a $\sigma$-algebra
$\mathcal{B} := \mathcal{B}(\Omega)$ of subsets of $\Omega$ is given. An affine map $\rho \mapsto p(dx)$ from
$\mathcal{S}(\mathcal{H})$ into a set of probability distributions on $\Omega$,
$\mathcal{P} = \{p(dx)\}$, is called a measurement. There is a one-to-one
correspondence between measurements and resolutions
of the identity. A map from $\mathcal{B}$ into the set of positive
Hermitian operators,

$$M : B \mapsto M(B),$$

where $M$ satisfies

$$M(\emptyset) = O, \qquad M(\Omega) = I, \qquad M\Bigl(\bigcup_i B_i\Bigr) = \sum_i M(B_i), \quad B_i \cap B_j = \emptyset \ (i \ne j),$$

is called a positive operator valued measure (POVM).
Any physical measurement can be represented by a
POVM.

Now we describe our setting of state estimation.
Assume that a state $\rho_\theta$ on $\mathcal{H}$ is characterized by an unknown
finite-dimensional parameter $\theta \in \Theta \subset \mathbb{R}^p$.

A quantum state for $n$ systems, $\rho_\theta^{\otimes n}$, is described on
the $n$-fold tensor product Hilbert space $\mathcal{H}^{\otimes n}$. Suppose
that a system composed of $n + m$ subsystems is given and
that a measurement is performed only on $n$ selected
subsystems, with the other $m$ subsystems left untouched. Then the
measurement is described by $\{M_x \otimes I\}$, where $\{M_x\}$ is a
POVM on $\mathcal{H}^{\otimes n}$ and $I$ is the identity operator on $\mathcal{H}^{\otimes m}$.
Our aim is to estimate the true state $\sigma_\theta := \rho_\theta^{\otimes m}$ of the
remaining $m$ subsystems by using a measurement $\{M_x\}$
on the selected $n$ subsystems $\rho_\theta^{\otimes n}$. We fix an arbitrarily
chosen measurement. Note that this measurement is not
necessarily of the tensor product form $M_x^{\otimes n}$, which
represents a repetition of the same measurement $M_x$ on
each system. Thus, all possible measurements on $n$
subsystems, including those that use entanglement, are considered.

The performance of a predictive density operator $\sigma(x)$
is evaluated by the relative entropy $D(\sigma_\theta \| \sigma(x))$, a
quantum analogue of the Kullback-Leibler divergence in
classical statistics. The quantum relative entropy from $\rho$ to
$\sigma$ is defined by

$$D(\rho \| \sigma) := \mathrm{Tr}[\rho(\log\rho - \log\sigma)]. \tag{1}$$

It satisfies the positivity condition $D(\rho\|\sigma) \ge 0$, with
$D(\rho\|\sigma) = 0 \Leftrightarrow \rho = \sigma$. Thus, it can be used as a measure
of the goodness of state estimation.
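
As a rough illustration of definition (1) in finite dimensions, the following Python sketch evaluates the relative entropy for full-rank density matrices. The function name `relative_entropy`, the helper `logm_hermitian`, and the two example qubit states are assumptions introduced here for illustration; they do not appear in the paper.

```python
import numpy as np

def logm_hermitian(a):
    """Matrix logarithm of a positive definite Hermitian matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def relative_entropy(rho, sigma):
    """D(rho || sigma) = Tr[rho (log rho - log sigma)], assuming both are full rank."""
    diff = logm_hermitian(rho) - logm_hermitian(sigma)
    return float(np.real(np.trace(rho @ diff)))

# Hypothetical example states: a mixed qubit state and the maximally mixed state.
rho = np.array([[0.7, 0.1], [0.1, 0.3]])
sigma = np.array([[0.5, 0.0], [0.0, 0.5]])

print(relative_entropy(rho, rho))    # vanishes up to rounding
print(relative_entropy(rho, sigma))  # strictly positive since rho != sigma
```

For the maximally mixed qubit state the value reduces to $\log 2$ minus the von Neumann entropy of $\rho$, which gives a simple sanity check of the implementation.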

There are mainly two approaches to inference on the state
$\sigma_\theta$ for the parametric model above. One approach is to
use $\sigma_{\hat\theta(x)}$, where $\hat\theta(x)$ is an estimator of $\theta$ depending
on the observation $x$. The other approach corresponds
to the Bayesian predictive density approach in classical
statistics. We shall briefly review the idea. First,
we assume a probability density $\pi(\theta)$ on the parameter
space. In mathematical statistics, $\pi(\theta)$ is usually called
a prior density. For the case in which there is no knowledge about the
parameter $\theta$, often called the noninformative case, several
authors have discussed what kind of prior should be used.
From the data $x$ obtained from a measurement
$\{M_x\}$, a posterior distribution $\pi(\theta|x)$ is constructed as

$$\pi(\theta|x) := \frac{p^M(x|\theta)\,\pi(\theta)}{\int d\theta\; p^M(x|\theta)\,\pi(\theta)},$$

where $p^M(x|\theta) = \mathrm{Tr}\,\rho_\theta^{\otimes n} M_x$. Next, taking the average of
$\sigma_\theta$ with respect to $\pi(\theta|x)$, one obtains the Bayesian estimator

$$\sigma_\pi(x) := \int d\theta\; \sigma_\theta\, \pi(\theta|x).$$

We call this state estimator, as in classical statistics, a
Bayesian predictive density operator. To distinguish
the two estimators, we call $\sigma_{\hat\theta}$, the estimator based on
$\hat\theta$, a plug-in predictive density operator.

If we assume a prior probability density $\pi(\theta)$ on the
parameter space $\Theta$, the mixture state is given by

$$\int d\theta\; \pi(\theta)\, \rho_\theta^{\otimes n}. \tag{2}$$

A state of the form (2) is called an exchangeable state
and arises, e.g., if each subsystem is prepared in the same
unknown way, as in quantum state tomography. In a
quantum exchangeable model (2), as Schack et al.
showed, a posterior distribution $\pi(\theta|x)$ naturally arises.

Tanaka and Komaki showed that Bayesian predictive
density operators are better than plug-in predictive
density operators.

Theorem 1.

Suppose that we perform a measurement on $n$ selected
subsystems $\rho_\theta^{\otimes n}$ of a system $\rho_\theta^{\otimes(n+m)}$ composed of $n + m$
subsystems in order to estimate the remaining $m$
subsystems $\sigma_\theta = \rho_\theta^{\otimes m}$. The true parameter value $\theta$ is
unknown and a prior probability density $\pi(\theta)$ is assumed.
Let $\sigma(x)$ be any predictive density operator, where $x$ is
an outcome of a measurement $\{M_x\}$ on the $n$
subsystems. The performance of a predictive density operator $\sigma(x)$
is measured by the average relative entropy

$$E^\pi E^M[D(\sigma_\theta\|\sigma(x))] = \int d\theta\; \pi(\theta) \int dx\; p^M(x|\theta)\, D(\sigma_\theta\|\sigma(x))$$

from the true state $\sigma_\theta$. Then the Bayesian
predictive density operator $\sigma_\pi(x)$ based on the observation $x$
and the prior $\pi(\theta)$ is the best predictive density operator.

Remark.

In classical statistics, Aitchison [9] showed that the
Bayesian predictive density $p_\pi(y|x)$ performs better
under the Kullback-Leibler divergence than any
plug-in predictive density $p(y|\hat\theta)$ when a proper prior $\pi(\theta)$
is given. Theorem 1 is the corresponding result for
quantum predictive density operators.

In a different setting, Krattenthaler and Slater obtained
a similar result as a quantum version of Aitchison's
result [10]. While they consider a prior density $\pi(\theta)$ with
respect to an unknown state, we consider a posterior
density $\pi(\theta|x)$ with respect to a post-measurement state.
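
Theorem 1 can be checked numerically on a small toy model. The sketch below is not taken from the paper: the three-point parameter set `thetas`, the qubit family `state`, the prior weights, and the projective measurement in the computational basis are all hypothetical choices, made so that every state is full rank and every relative entropy is finite. It takes $n = m = 1$ and compares the Bayesian predictive density operator with a plug-in operator built from the maximum likelihood estimate over the finite parameter set.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)

def logm_hermitian(a):
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def rel_ent(rho, sigma):
    return float(np.real(np.trace(rho @ (logm_hermitian(rho) - logm_hermitian(sigma)))))

def state(theta, r=0.8):
    # Hypothetical full-rank qubit family with Bloch vector r*(sin theta, 0, cos theta).
    return 0.5 * (I2 + r * (np.sin(theta) * X + np.cos(theta) * Z))

thetas = [0.3, 1.2, 2.0]                              # assumed finite parameter set
prior = np.array([0.5, 0.3, 0.2])                     # assumed prior weights
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]     # measurement on the first copy

states = [state(t) for t in thetas]
# lik[i, x] = Tr(rho_{theta_i} M_x)
lik = np.array([[np.trace(s @ M).real for M in povm] for s in states])

def average_risk(predictor):
    # Discrete analogue of E^pi E^M [ D(sigma_theta || sigma(x)) ].
    return sum(prior[i] * lik[i, x] * rel_ent(states[i], predictor(x))
               for i in range(len(thetas)) for x in range(len(povm)))

def bayes_predictor(x):
    post = prior * lik[:, x]
    post /= post.sum()                                # posterior pi(theta | x)
    return sum(w * s for w, s in zip(post, states))

def plugin_predictor(x):
    return states[int(np.argmax(lik[:, x]))]          # maximum likelihood plug-in

bayes_risk = average_risk(bayes_predictor)
plugin_risk = average_risk(plugin_predictor)
print(bayes_risk, plugin_risk)
```

In this discrete setting the integrals over $\theta$ and $x$ become finite sums, so the comparison of the two average risks is exact up to floating-point error.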



III. PREDICTION OF UNKNOWN GAUSSIAN 
STATE FROM ONE SAMPLE 

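We consider the prediction problem for the Gaussian
states family below (see, e.g., Holevo [3] for the Gaussian
states family):

$$\mathcal{M} := \{\rho_{\theta,N} : \theta \in \mathbb{C}\}, \qquad \rho_{\theta,N} := \frac{1}{\pi N}\int_{\mathbb{C}} \exp\left(-\frac{|\alpha-\theta|^2}{N}\right) |\alpha\rangle\langle\alpha|\, d^2\alpha, \tag{3}$$

where $|\alpha\rangle$ is a coherent state vector and the photon expectation parameter
$N\,(>0)$ is assumed to be known. We omit $N$ unless it is
needed.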

The parameter estimation problem for this model
was investigated by Yuen and Lax and by Holevo.
They obtained the Cramer-Rao type bound, i.e., the lower
bound of the trace of the mean square error matrix with
an arbitrary weight matrix, based on the right logarithmic derivative (RLD) Fisher
information matrix. They showed that the heterodyne
measurement $\{\pi^{-1}|\alpha\rangle\langle\alpha|\, d^2\alpha\}$ achieves the bound and is therefore
optimal. This measurement is also optimal in an asymptotic
sense, as shown by Hayashi.

Here, we consider the prediction problem in the
Bayesian framework. Assume that the unknown parameter
$\theta$ is distributed according to

$$\pi(\theta) = \frac{1}{2\pi\tau^2}\exp\left(-\frac{|\theta-\xi|^2}{2\tau^2}\right), \tag{4}$$

where $\xi \in \mathbb{C}$ and $\tau^2 > 0$ are so-called hyperparameters.

In this section we consider only the case $n = m = 1$ for
simplicity. The general case of arbitrary $n$ and $m$ is
considered in the next section. When $n = 1$, it is natural to
adopt the heterodyne measurement above. Then the
estimator of $\theta$ is given by $\hat\theta(\alpha) = \alpha$, where the measurement
outcome $\alpha$ is distributed according to

$$\alpha \sim p^M(\alpha|\theta) = \frac{1}{\pi(N+1)}\exp\left(-\frac{|\alpha-\theta|^2}{N+1}\right).$$

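We calculate the average relative entropy for the two
predictive density operators $\rho_{\hat\theta}$ and $\rho_\pi$. A straightforward
calculation yields

$$\rho_\pi(\alpha) = \frac{1}{\pi(N+2\Delta^2)}\int_{\mathbb{C}} \exp\left(-\frac{|\beta-\hat\theta_\pi(\alpha)|^2}{N+2\Delta^2}\right)|\beta\rangle\langle\beta|\, d^2\beta,$$

where

$$(\Delta^2)^{-1} := \left(\frac{N+1}{2}\right)^{-1} + (\tau^2)^{-1}, \qquad \hat\theta_\pi(\alpha) := \Delta^2\left\{\left(\frac{N+1}{2}\right)^{-1}\alpha + (\tau^2)^{-1}\xi\right\}.$$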

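The average relative entropies for them are obtained as

$$\mathcal{R}_p := E^\pi E^M[D(\rho_\theta\|\rho_{\hat\theta})] = (N+1)\log\left(\frac{N+1}{N}\right),$$

$$\mathcal{R}_\pi := E^\pi E^M[D(\rho_\theta\|\rho_\pi)] = \log\frac{1}{N+1} + N\log\frac{N}{N+1} - \log\frac{1}{N+2\Delta^2+1} - (N+2\Delta^2)\log\frac{N+2\Delta^2}{N+2\Delta^2+1},$$

where we used the formula for the Gaussian states family

$$D(\rho_{\zeta,N}\|\rho_{\zeta',M}) = \log\left(\frac{M+1}{N+1}\right) + N\log\left(\frac{N}{N+1}\,\frac{M+1}{M}\right) + \log\left(\frac{M+1}{M}\right)|\zeta-\zeta'|^2. \tag{5}$$

Since $\mathcal{R}_\pi$ is monotone increasing in $\tau^2$,

$$\sup_{\tau^2>0} \mathcal{R}_\pi = \lim_{\tau^2\to\infty} \mathcal{R}_\pi = \log\frac{1}{N+1} + N\log\frac{N}{N+1} - \log\frac{1}{2N+2} - (2N+1)\log\frac{2N+1}{2N+2} =: \mathcal{R}_*.$$

In addition, a straightforward calculation shows that
$\mathcal{R}_p > \mathcal{R}_* > \mathcal{R}_\pi$; indeed,
$\mathcal{R}_p - \mathcal{R}_* = (2N+1)\log\left(1+\frac{1}{2N}\right) - \log 2 > 1 - \log 2 > 0$.
Thus, the Bayesian predictive density operator $\rho_\pi$ is better than
the plug-in density operator $\rho_{\hat\theta}$ based on $\hat\theta(\alpha)$.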
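
The ordering of the three risks for $n = m = 1$ can be explored numerically. The sketch below is a minimal illustration under the reading that $2\Delta^2 = \{(N+1)^{-1} + (2\tau^2)^{-1}\}^{-1}$; the function names `risk_plugin`, `risk_bayes`, and `risk_bayes_limit` are hypothetical and the sample values of $N$ and $\tau^2$ are arbitrary.

```python
import math

def risk_plugin(N):
    """Average relative entropy of the plug-in operator: (N+1) log((N+1)/N)."""
    return (N + 1) * math.log((N + 1) / N)

def two_delta_sq(N, tau2):
    """Posterior spread 2*Delta^2 = ((N+1)^{-1} + (2 tau^2)^{-1})^{-1} (assumed reading)."""
    return 1.0 / (1.0 / (N + 1) + 1.0 / (2.0 * tau2))

def _risk_with_spread(N, M):
    # Common form of the Bayesian risk with M = N + 2*Delta^2.
    return (math.log(1 / (N + 1)) + N * math.log(N / (N + 1))
            - math.log(1 / (M + 1)) - M * math.log(M / (M + 1)))

def risk_bayes(N, tau2):
    """Average relative entropy of the Bayesian predictive operator for prior variance tau2."""
    return _risk_with_spread(N, N + two_delta_sq(N, tau2))

def risk_bayes_limit(N):
    """Limit tau^2 -> infinity, where 2*Delta^2 = N + 1."""
    return _risk_with_spread(N, 2 * N + 1)

for N in (0.1, 1.0, 10.0):
    print(N, risk_plugin(N), risk_bayes_limit(N), risk_bayes(N, 1.0))
```

For each listed $N$ the printed values decrease from left to right, in line with the ordering of the plug-in risk, the limiting Bayesian risk, and the Bayesian risk at finite $\tau^2$.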

Since the model $\mathcal{M}$ is translation invariant, it seems
natural to adopt the Lebesgue measure $\pi_J(\theta)\,d^2\theta \propto d^2\theta$
as a noninformative prior. Although $\int \pi_J(\theta)\,d^2\theta = \infty$,
as in classical statistics, various quantities are obtained by
taking the limit $\tau^2 \to \infty$. Since $2\Delta^2 = N + 1$ in this limit, the Bayesian
predictive density operator is given by

$$\rho_{\pi_J}(\alpha) = \frac{1}{\pi(2N+1)}\int_{\mathbb{C}}\exp\left(-\frac{|\beta-\alpha|^2}{2N+1}\right)|\beta\rangle\langle\beta|\, d^2\beta,$$

and the average relative entropy is equal to $\mathcal{R}_*\,(<\infty)$.



IV. PREDICTION OF UNKNOWN GAUSSIAN
STATE FROM n SAMPLES

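Now we deal with the more general case. Assume that the
unknown $n + m$ systems $\rho_{\theta,N}^{\otimes(n+m)}$ are prepared, where $N$
is known and $\theta$ is unknown and subject to the prior (4).

We fix the heterodyne measurement $\{\pi^{-1}|\alpha\rangle\langle\alpha|\, d^2\alpha\}$ and
perform it on each of $n$ arbitrarily chosen systems. Then each
datum $\alpha_i \in \mathbb{C}$ is independently distributed according to

$$\alpha_i \sim p^M(\alpha_i|\theta) = \frac{1}{\pi(N+1)}\exp\left(-\frac{|\alpha_i-\theta|^2}{N+1}\right).$$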

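We consider the estimation of the remaining $m$ systems,
$\sigma_\theta := \rho_\theta^{\otimes m}$, from these data $\alpha = (\alpha_1, \dots, \alpha_n)$. Let us calculate
the average risk for $\sigma_{\hat\theta} = \rho_{\hat\theta}^{\otimes m}$ and $\sigma_\pi(\alpha)$. The plug-in
density operator is given by

$$\sigma_{\hat\theta}(\alpha) = \rho_{\bar\alpha_n}^{\otimes m},$$

where

$$\bar\alpha_n := \frac{1}{n}\sum_{i=1}^{n}\alpha_i$$

is the maximum likelihood estimator. On the other hand,
the Bayesian predictive density operator is given by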

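$$\sigma_\pi(\alpha) = \int d^2\beta_1\cdots d^2\beta_m\; \bigotimes_{j=1}^{m} |\beta_j\rangle\langle\beta_j|\; p_\pi(\beta|\alpha), \tag{6}$$

where

$$p_\pi(\beta|\alpha) = \frac{1}{\pi(N+2m\Delta_n^2)}\left(\frac{1}{\pi N}\right)^{m-1}\exp\left\{-\left(\frac{1}{2\Delta_n^2}+\frac{m}{N}\right)B\right\}$$

and

$$B := p\,(|\beta_1|^2+\cdots+|\beta_m|^2) + q\,|\hat\theta_\pi|^2 - |p\,(\beta_1+\cdots+\beta_m)+q\,\hat\theta_\pi|^2,$$

$$p := \frac{(N/2)^{-1}}{(\Delta_n^2)^{-1}+m(N/2)^{-1}}, \qquad q := \frac{(\Delta_n^2)^{-1}}{(\Delta_n^2)^{-1}+m(N/2)^{-1}},$$

where

$$\hat\theta_\pi(\alpha) := \frac{n\left(\frac{N+1}{2}\right)^{-1}\bar\alpha_n + (\tau^2)^{-1}\xi}{n\left(\frac{N+1}{2}\right)^{-1}+(\tau^2)^{-1}}, \qquad (\Delta_n^2)^{-1} := n\left(\frac{N+1}{2}\right)^{-1}+(\tau^2)^{-1}.$$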



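Since

$$E^\pi E^M|\bar\alpha_n-\theta|^2 = \frac{N+1}{n} \quad\text{and}\quad E^\pi E^M|\hat\theta_\pi(\alpha)-\theta|^2 = 2\Delta_n^2,$$

each average relative entropy is obtained as

$$\mathcal{R}_p := E^\pi E^M[D(\sigma_\theta\|\sigma_{\hat\theta})] = \frac{m}{n}(N+1)\log\left(\frac{N+1}{N}\right),$$

$$\mathcal{R}_\pi := E^\pi E^M[D(\sigma_\theta\|\sigma_\pi)] = \log\frac{1}{N+1}+N\log\frac{N}{N+1}-\log\frac{1}{N+2m\Delta_n^2+1}-(N+2m\Delta_n^2)\log\frac{N+2m\Delta_n^2}{N+2m\Delta_n^2+1}.$$

Again it is easily shown that $\mathcal{R}_p > \mathcal{R}_\pi$ for arbitrary
hyperparameters $\xi$ and $\tau^2 > 0$.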
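
The comparison for general $n$ and $m$ can be tabulated with a few lines of Python. This is a sketch under the assumption that $(\Delta_n^2)^{-1} = n\{(N+1)/2\}^{-1} + (\tau^2)^{-1}$; the function names `delta_n_sq`, `risk_plugin`, and `risk_bayes` and the grid of sample values are hypothetical.

```python
import math

def delta_n_sq(N, n, tau2):
    """Delta_n^2 with (Delta_n^2)^{-1} = n * ((N+1)/2)^{-1} + (tau^2)^{-1} (assumed reading)."""
    return 1.0 / (2.0 * n / (N + 1) + 1.0 / tau2)

def risk_plugin(N, n, m):
    """Plug-in risk (m/n) (N+1) log((N+1)/N)."""
    return (m / n) * (N + 1) * math.log((N + 1) / N)

def risk_bayes(N, n, m, tau2):
    """Bayesian risk, written with M = N + 2 m Delta_n^2."""
    M = N + 2.0 * m * delta_n_sq(N, n, tau2)
    return (math.log(1 / (N + 1)) + N * math.log(N / (N + 1))
            - math.log(1 / (M + 1)) - M * math.log(M / (M + 1)))

for n, m in ((1, 1), (5, 1), (5, 3)):
    print(n, m, risk_plugin(1.0, n, m), risk_bayes(1.0, n, m, tau2=10.0))
```

With $M = N + 2m\Delta_n^2$ the Bayesian risk can be written as $g(M) - g(N)$ for $g(x) = (x+1)\log(x+1) - x\log x$, and the strict concavity of $g$ is one way to see why it stays below the plug-in risk.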



V. CONCLUDING REMARKS 

Strictly speaking, the proof of Theorem 1 is valid only
for finite-dimensional cases (i.e., $\dim\mathcal{H} < \infty$). Thus,
we have only compared the plug-in predictive density
operator based on the maximum likelihood estimate $\hat\theta$ with
the Bayesian predictive density operator, and shown that
the latter is better than the former in terms of the average
relative entropy. However, we expect that Theorem 1 can be
extended to infinite-dimensional cases under some
regularity conditions, such as the exchangeability of the
order of $\mathrm{Tr}$ and $\int d\theta\, \pi(\theta)$ and the integrability of
$\rho_\pi(x) = \int d\theta\, \pi(\theta|x)\rho_\theta$. The quantum Gaussian states family is
known to have good properties, as the classical Gaussian
family does. Therefore, it could be shown that the
Bayesian predictive density operator is really the best
predictive density operator under the prior (4). Such rigorous
arguments are left for future study.



APPENDIX A: CALCULATION OF THE
FORMULA (5)

In this section, we derive the formula (5). First we
review the notation and the mathematical description
needed to show the formula (5). For details and their
physical meaning, see, e.g., Walls and Milburn.

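Recall that the projective unitary representation of the
translation group on the complex plane satisfies, up to a phase factor,

$$U_\theta U_\eta = U_{\theta+\eta}, \qquad \forall\theta, \eta \in \mathbb{C},$$

and the Gaussian state with mean parameter $\theta$ is given
by a unitary transformation of this group,

$$\rho_{N,\theta} = U_\theta\, \rho_{N,0}\, U_\theta^*, \qquad \forall\theta \in \mathbb{C},$$

where $U^*$ denotes the adjoint operator of $U$. (It is often
denoted by $U^\dagger$ in physics.) On the other hand, a coherent
state vector is defined in the following form,

$$|\alpha\rangle := e^{-|\alpha|^2/2}\sum_{n=0}^{\infty}\frac{\alpha^n}{\sqrt{n!}}\,|n\rangle.$$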



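Here, $|n\rangle$ denotes the $n$-photon excited state, which is defined
by

$$|n\rangle := \frac{(a^*)^n}{\sqrt{n!}}\,|0\rangle,$$

where $a^*$ is the so-called creation operator. Note that
$\langle n|m\rangle = \delta_{nm}$ for $n, m = 0, 1, \dots$ and

$$\langle n|\alpha\rangle = \frac{\alpha^n}{\sqrt{n!}}\, e^{-|\alpha|^2/2}.$$

A coherent state vector is a mathematical representation
of light of a certain frequency.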

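Now we derive the formula. The key point is to
calculate the following trace:

$$\begin{aligned}
\mathrm{Tr}\,\rho_{N,\theta}\log\rho_{M,\eta} &= \mathrm{Tr}\,\rho_{N,\theta}\log(U_\eta\,\rho_{M,0}\,U_\eta^*)\\
&= \mathrm{Tr}\,\rho_{N,\theta}\,U_\eta(\log\rho_{M,0})U_\eta^*\\
&= \mathrm{Tr}\,U_\eta^*\,\rho_{N,\theta}\,U_\eta(\log\rho_{M,0})\\
&= \mathrm{Tr}\,\rho_{N,\theta-\eta}(\log\rho_{M,0}).
\end{aligned}$$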



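For simplicity, we calculate $\mathrm{Tr}\,\rho_{N,-\eta}(\log\rho_{M,0})$. Recall
that $\rho_{N,0}$ is diagonalized in the orthonormal basis
$\{|n\rangle\}_{n=0,1,\dots}$,

$$\rho_{N,0} = \sum_{n=0}^{\infty}\frac{1}{N}\left(\frac{N}{N+1}\right)^{n+1}|n\rangle\langle n|.$$

We obtain the logarithm of this density operator, with $M$ in place of $N$,

$$\log\rho_{M,0} = \sum_{n=0}^{\infty}\log\left\{\frac{1}{M}\left(\frac{M}{M+1}\right)^{n+1}\right\}|n\rangle\langle n|,$$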



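and the matrix element with respect to the coherent vector $|\alpha\rangle$ is given
by

$$\begin{aligned}
\langle\alpha|\log\rho_{M,0}|\alpha\rangle &= \sum_{n=0}^{\infty}\log\left\{\frac{1}{M}\left(\frac{M}{M+1}\right)^{n+1}\right\}|\langle\alpha|n\rangle|^2\\
&= \sum_{n=0}^{\infty}\left\{\log\frac{1}{M+1}+n\log\left(\frac{M}{M+1}\right)\right\}e^{-|\alpha|^2}\frac{(|\alpha|^2)^n}{n!}\\
&= \log\frac{1}{M+1}+|\alpha|^2\log\frac{M}{M+1}.
\end{aligned}$$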



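Then,

$$\begin{aligned}
\mathrm{Tr}\,\rho_{N,-\eta}\log\rho_{M,0} &= \int_{\mathbb{C}}\frac{1}{\pi N}\exp\left(-\frac{|\alpha+\eta|^2}{N}\right)\left\{\log\frac{1}{M+1}+|\alpha|^2\log\frac{M}{M+1}\right\}d^2\alpha\\
&= \log\frac{1}{M+1}+\log\left(\frac{M}{M+1}\right)\left\{N+|\eta|^2\right\}.
\end{aligned}$$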



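Using this formula, we obtain the relative entropy
formula (5):

$$\begin{aligned}
D(\rho_{N,\theta}\|\rho_{M,\eta}) &= \mathrm{Tr}\{\rho_{N,\theta}(\log\rho_{N,\theta}-\log\rho_{M,\eta})\}\\
&= \mathrm{Tr}\{\rho_{N,0}(\log\rho_{N,0})\}-\mathrm{Tr}\{\rho_{N,\theta-\eta}(\log\rho_{M,0})\}\\
&= \log\frac{M+1}{N+1}+N\log\left(\frac{N}{N+1}\,\frac{M+1}{M}\right)-\log\left(\frac{M}{M+1}\right)|\theta-\eta|^2.
\end{aligned}$$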
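
The closed form (5) can be cross-checked numerically in a truncated Fock space. The sketch below is only an approximation under stated assumptions: the truncation dimension `dim`, the function names `closed_form` and `numerical`, and the sample parameter values are hypothetical, and the displacement operator is taken to be the standard $\exp(z a^* - \bar z a)$ built from a truncated annihilation matrix.

```python
import numpy as np

def closed_form(N, M, theta, eta):
    """Right-hand side of formula (5)."""
    return (np.log((M + 1) / (N + 1))
            + N * np.log(N * (M + 1) / ((N + 1) * M))
            + np.log((M + 1) / M) * abs(theta - eta) ** 2)

def numerical(N, M, theta, eta, dim=80):
    """Tr rho_{N,0} log rho_{N,0} - Tr rho_{N,theta-eta} log rho_{M,0} in a truncated Fock basis."""
    n = np.arange(dim)
    a = np.diag(np.sqrt(n[1:]), k=1)               # truncated annihilation operator

    def displacement(z):
        g = z * a.conj().T - np.conj(z) * a        # anti-Hermitian generator
        w, v = np.linalg.eigh(1j * g)              # 1j * g is Hermitian
        return (v * np.exp(-1j * w)) @ v.conj().T  # exp(g) = V exp(-i w) V^dagger

    def thermal_probs(K):
        return (K / (K + 1.0)) ** n / (K + 1.0)    # eigenvalues of rho_{K,0}

    p_N, p_M = thermal_probs(N), thermal_probs(M)
    U = displacement(theta - eta)
    rho = U @ np.diag(p_N) @ U.conj().T            # approximates rho_{N, theta - eta}
    term1 = np.sum(p_N * np.log(p_N))
    term2 = np.real(np.trace(rho @ np.diag(np.log(p_M))))
    return float(term1 - term2)

args = (0.5, 1.2, 0.4 + 0.3j, -0.2 + 0.1j)
print(closed_form(*args), numerical(*args))
```

The truncation is harmless here because the thermal weights decay geometrically; for much larger $N$, $M$, or displacement, `dim` would need to grow accordingly.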